ABSTRACT: Computational modeling has been adopted in all aspects of drug research and 
development, from the early phases of target identification and drug discovery to the late-stage clinical 
trials. The different questions addressed during each stage of drug R&D have led to the emergence of 
different modeling methodologies. In the research phase, systems biology couples experimental data 
with elaborate computational modeling techniques to capture lifecycle and effector cellular functions 
(e.g. metabolism, signaling, transcription regulation, protein synthesis and interaction) and integrates 
them in quantitative models. These models are subsequently used in various ways, i.e. to identify 
new targets, generate testable hypotheses, gain insights on the drug's mode of action (MOA), translate 
preclinical findings, and assess the potential of clinical drug efficacy and toxicity. In the development 
phase, pharmacokinetic/pharmacodynamic (PK/PD) modeling is the established way to determine safe 
and efficacious doses for testing in increasingly larger cohorts of subjects that are increasingly pertinent 
to the target indication. First, the relationship between drug input and its concentration in plasma is established. 
Second, the relationship between this concentration and desired or undesired PD responses is ascertained. 
Recognizing that the interface of systems biology with PK/PD will facilitate drug development, systems 
pharmacology came into existence, combining methods from PK/PD modeling and systems engineering 
explicitly to account for the implicated mechanisms of the target system in the study of drug-target 
interactions. Herein, a number of popular systems biology methodologies are discussed, which could be 
leveraged within a systems pharmacology framework to address major issues in drug development. 
© 2013 The Authors. Biopharmaceutics & Drug Disposition published by John Wiley & Sons, Ltd. 
Introduction 

Computational modeling has been adopted in all 
aspects of drug research and development, from 
the early stages of target identification and drug 
discovery, up to phase II-IV of clinical trials. In 
the early stages, computational modeling in the 
form of systems biology, fueled by recent advances 
in 'omics' technologies, constructs predictive 
models to integrate major cellular functions and 
monitor how these are altered in disease; identifies 
new drug targets; predicts the potential of clinical 
efficacy and toxicity of uncharacterized compounds 
and probes their mode of action (MOA). Systems 
biology employs elaborate methodologies to 
exploit and integrate prior knowledge of the 
interrogated system (disease, cell type, a specific 
tissue, etc. and their interactions with compounds 
of interest) and extracts qualitative or quantitative 
features that will lead to its better understanding, 
eventually facilitating drug development. The 
preclinical and clinical development, on the other 
hand, follows a completely different paradigm. 
Scientists, clinical pharmacologists and physicians 
have to decide on issues such as: What is the 
optimal drug candidate to advance in the next 
phase of clinical trials? What dose/regimen will 
maximize efficacy and minimize toxicity? What 
are the right patients to treat with the drug? 
Because of the nature and consequences of these 
questions and because of the existing staged 
paradigm of drug development [1], PK/PD 
modeling, an empirical data-driven approach based 
on cardinal pharmacological principles [2], has 
gained wide acceptance: PK/PD modeling first 
establishes the relationship between drug input 
and its concentration in plasma and second, estab- 
lishes the relationship between this drug plasma 
concentration and a desired or undesired response, 
be it a biomarker, a clinical endpoint or an adverse 
event, e.g. tumor growth or cell apoptosis. 
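
The two steps described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a one-compartment model with an intravenous bolus dose and first-order elimination for the PK step, and a sigmoid Emax model for the PD step. Neither model is prescribed by the text, and the function names and all parameter values (dose, volume, clearance, Emax, EC50) are hypothetical.

```python
import math

def plasma_concentration(dose_mg, volume_l, clearance_l_per_h, t_h):
    """PK step, one-compartment IV bolus: C(t) = (Dose / V) * exp(-(CL / V) * t)."""
    k_el = clearance_l_per_h / volume_l  # first-order elimination rate constant (1/h)
    return (dose_mg / volume_l) * math.exp(-k_el * t_h)

def emax_response(conc, e0, emax, ec50, hill=1.0):
    """PD step, sigmoid Emax model: E = E0 + Emax * C^h / (EC50^h + C^h)."""
    return e0 + emax * conc**hill / (ec50**hill + conc**hill)

# Hypothetical parameter values, chosen for illustration only.
DOSE, V, CL = 100.0, 50.0, 5.0      # mg, L, L/h
E0, EMAX, EC50 = 0.0, 100.0, 1.0    # response units, response units, mg/L

for t in (0, 2, 6, 12, 24):
    c = plasma_concentration(DOSE, V, CL, t)
    e = emax_response(c, E0, EMAX, EC50)
    print(f"t = {t:2d} h   C = {c:.3f} mg/L   E = {e:.1f}")
```

In practice the parameters would be estimated from observed concentration and response data, together with their uncertainty and inter-individual variability, rather than fixed by hand as in this sketch.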

Traditionally, systems biology and PK/PD 
modeling existed in 'parallel universes' [3]. 
However, it is becoming evident that their role is 
complementary and that a lot could be gained 
by their integration. For example, PK/PD 
overlooks pertinent biological network aspects 
such as the molecular basis of the disease, the 
way in which cellular processes are orchestrated 
by intricate signaling mechanisms, and how these 
mechanisms cross-react. This severely limits the 
general applicability of the model and disallows 
extrapolations and wider pharmacological and 
biological insights. On the other hand, PK/PD is a 
relatively tractable and pharmacologically sound 
method that manages to tame and quantify the 
uncertainty around the point estimates of the 
model parameters as well as the inter-individual/ 
biological and unexplained variability inherent in 
the data. Thus, it can quantify the dose-exposure- 
response relationship and assess the robustness 
of the model. The latter is not always true in 
systems biology. Systems biology elucidates the 
qualitative and quantitative aspects of the 
molecular mechanisms of disease by leveraging 
extensive and diverse data from in vitro 
experiments, but that imposes limitations, especially 
when there is limited understanding of how these 
data are incorporated in the physiology of the 
target organism and how they affect its clinical 
response. Thus, depending on the system under 
study, systems biology approaches may lead 
to intractable or ill-defined models with low 
confidence in their parameters, making their use 
for robust quantitative predictions of clinical 
response risky. Systems pharmacology emerged 
to form this exact interface between PK/PD and 
systems biology [4,5]. It combines methods from 
PK/PD modeling and systems engineering 
explicitly to account for the implicated mecha- 
nisms of the target system in the study of drug- 
target interactions, albeit in a tractable, robust 
way. In more detail, instead of ignoring the 
biology of the target organism and focusing only 
on the data-driven correlation of drug exposure 
and clinical response, systems pharmacology 
employs mechanistic methods to capture key 
properties of the system (e.g. focused biology 
around the target or biomarker), increasing the 
applicability and relevance of the model. 

Herein, we first present the current PK/PD 
methodologies applied in drug development and 
then discuss a number of systems biology ap- 
proaches that could be leveraged within a systems 
pharmacology framework to facilitate: (i) the 
modeling of signaling pathways, (ii) identification 
of drug MOA and (iii) prediction of clinical drug 
efficacy and toxicity. 

Computational Modeling in the Clinic 

In the clinical phase of drug development, compu- 
tational modeling takes place mostly in the form 
of PK/PD. This is now a well-defined discipline 
explained in established textbooks [2,6]; thus, 
the technical details will not be covered here. 
Physiologically based pharmacokinetic (PBPK) 
modeling, an approach lying between standard 
PK modeling and systems pharmacology, warrants 
a brief discussion. PBPK uses compartments that 
correspond to specific tissues/organs and models 
drug distribution between them in a physiologi- 
cally realistic manner using the cardiovascular 
system [7,8]. Moreover, recent advancements 
in the predictability of key pharmacokinetic 
parameters from human in vitro data and in the 
availability of dedicated software platforms and 
associated databases have allowed PBPK to construct 
even more detailed models, predicting the 
time-dependent plasma concentration of the drug, 
drug-drug interactions and the effects of age, 
genetics and disease on the kinetics of the drug. 

Recently, a few mechanistic models have been 
proposed that capture the processes governing 
the transduction of target activation into the 
response in vivo [9]. These models employ concepts 
from dynamic systems analysis (such as ordinary 
differential equation (ODE) modeling) to model 
signaling cascades or even homeostatic feedback 
(transduction models). 
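
One simple way to picture such a transduction model is a chain of transit compartments, each relaxing toward the one upstream of it with a common time constant. The sketch below assumes this particular structure (it is not taken from [9]), integrates the ODEs dM_i/dt = (M_(i-1) - M_i) / tau with a forward Euler scheme, and uses hypothetical names and parameter values throughout.

```python
def simulate_transit_chain(stimulus, n_compartments, tau_h, t_end_h, dt_h=0.01):
    """Forward-Euler integration of dM_i/dt = (M_(i-1) - M_i) / tau.

    M_0 is the driving stimulus (e.g. target activation); the last
    compartment plays the role of the observed, delayed response.
    """
    levels = [0.0] * n_compartments
    n_steps = int(round(t_end_h / dt_h))
    for step in range(n_steps):
        upstream = stimulus(step * dt_h)
        updated = []
        for level in levels:
            updated.append(level + dt_h * (upstream - level) / tau_h)
            upstream = level  # the next compartment is driven by this one's old value
        levels = updated
    return levels

def constant_stimulus(t_h):
    """Hypothetical input: target activation switched on at t = 0 and held at 1."""
    return 1.0

early = simulate_transit_chain(constant_stimulus, 3, tau_h=1.0, t_end_h=1.0)
late = simulate_transit_chain(constant_stimulus, 3, tau_h=1.0, t_end_h=30.0)
print("after 1 h: ", [round(x, 3) for x in early])
print("after 30 h:", [round(x, 3) for x in late])
```

The downstream compartments lag the upstream ones at early times and all converge to the stimulus level at steady state, which is the delayed-response behavior such models are meant to capture. A production model would use a proper ODE solver rather than fixed-step Euler.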

It has become clear in recent years that PK/PD 
models are evolving to account for the mecha- 
nisms of the target organism [4]. Systems pharma- 
cology works in this direction, attempting to 
incorporate methodologies from systems engi- 
neering and systems biology with PK/PD model- 
ing to widen its applicability and to facilitate the 
prediction of drug action [4,5]. Systems pharma- 
cology in the preclinical phase could aid in the 
design of clinical companion tests to uncover 
sensitivity (or resistance) to different drugs; or 
screen high-risk drugs out of the development 
pipeline as early as possible, de-risking programs 
ahead of phase II and saving valuable resources. 
Unfortunately, the clinical applicability of systems 
approaches is constrained by the limited sample 
availability. In contrast to the discovery phase where 
extensive in vitro experiments can be carried out, 
during clinical development computational models 
are forced to work with a few biomarkers expressed 
in blood (e.g. cytokine releases in the bloodstream) 
and only occasionally with biopsy samples. 
Moreover, biomarkers in blood are typically characterized 
by a low signal-to-noise ratio, resulting in 
ambiguous predictions. A workaround to some of these 
limitations may involve the proteomic technologies 
that provide high content data with minimum 
sample requirements, such as the xMAP technology, 
protein microarrays and flow cytometry (see Box 1). 
Some of them have already been used in the clinic, 
mostly in the quest for personalized medicine. 
Leveraging these technologies within a systems 
pharmacology framework could provide the 
experimental data necessary for the construction of 
more detailed and biologically relevant models. 



Box 1. Phosphoproteomic technologies. 

Phosphoproteomic technologies play a key role in computational 
modeling for drug development, since they provide high content 
data at the level where most modern drugs act [49]. As 
a result, all forms of mechanistic modeling for the study of 
signaling pathways and the identification of drug MOA 
leverage phosphoproteomic data. The most commonly employed 
phosphoproteomic technologies in drug development that are also 
suitable for the clinical phase and the analysis of biopsy samples 
are protein microarrays, xMAP technology and flow cytometry. 

Protein microarrays [34,35] typically employ a glass slide, on 
top of which the capture antibody is spotted. Then, the sample 
bathes the slide and proteins from the sample are immobilized 
on the matching antibodies. At the final step of the assay, a 
secondary antibody (which is biotinylated) bathes the slide and 
binds to the corresponding proteins. The amount of proteins on 
the slide is measured by a plate reader, providing an estimate of 
protein abundance, whereas if one of the two antibodies is 
anti-phospho, the phosphorylation of the protein is estimated instead. 
The use of robotic systems to spot the antibodies on the slide 
allows microarrays to measure up to 1000 proteins per sample, 
essentially making antibody availability the limiting factor of this 
platform. A variation of protein microarrays called Reverse Phase 
Protein Arrays (RPPA) employs the spotting of the sample on the 
glass slide; then an antibody solution, which is biotinylated, bathes 
the slide and binds to the matching proteins. In this manner 
thousands of samples can be screened at the same time but 
measuring only a single signal. 

xMAP technology [36,64] is an antibody-based, suspension 
array technology. xMAP employs polystyrene beads as a 
substrate, on top of which capture antibodies are coupled; the 
beads are color-coded, so every bead color corresponds to a 
different capture antibody (signal). The sample is plated on 96 or 
384 well plates and beads of different colors are multiplexed and 
suspended with the sample. Proteins from the sample bind 
to the capture antibody on the bead surface, while the rest of 
them are washed away. Then, a secondary antibody, which is 
biotinylated, is introduced and binds to the immobilized 
proteins, thus completing a sandwich assay. Another 
washing step follows, and the bead-protein-secondary 
antibody construct then goes through the xMAP detection system, 
in which two lasers are used: one excites the bead's red color 
and the other excites the fluorophore's green color. Two 
photomultipliers collect the emissions and provide an 
estimate of protein abundance for each signal. With xMAP 
technology, up to 30 signals may be measured in each well, 
providing high sample and signal throughput. 

Flow cytometry [37] is a single cell technology that employs 
the suspension of cells, properly labeled with fluorescent 
chemicals, in a stream of liquid going through a detection 
system. The detection system uses a laser to excite the 
fluorophore and a photomultiplier measures the emitted 
signature. To measure phosphorylation activity, phospho- 
specific fluorescent antibodies are used that bind to the target 
proteins. If more than one signal is to be quantified at the same 
time, the fluorophores must emit at different wavelengths 
(colors). Using polychromatic flow cytometry, Perez et al. [37] 
measured a total of 11 proteins (members of the MAPK family) 
in both artificially and physiologically perturbed peripheral 
blood mononuclear cells (PBMCs). 

Computational Modeling in the Discovery Phase 

Computational modeling in the discovery phase 
employs elaborate methodologies to exploit prior 
knowledge of the interrogated system (disease, 
cell type, a specific tissue, a compound of interest, 
etc.) and extracts qualitative features that will 
lead to its better understanding and ultimately 
facilitate drug development. The employed 
methodologies can be broken down into two 
classes: (i) the data driven methods and (ii) the 
mechanism driven methods. Data driven methods 
exploit extensive datasets and use straightforward 
approaches to extract interpretable features of 
the interrogated system. These approaches include 
mostly machine learning algorithms (clustering 
analysis, classification, Bayesian inference), regres- 
sion methods, methods from information theory 
(mutual information) or optimization algorithms. 
These methods are agnostic to the underlying 
biology since they ignore the mechanisms that 
define system behavior. Instead they trust the 
experimental data in capturing all relevant 
information on the interrogated system and focus 
on modeling these data. On the other hand, mech- 
anism driven methods employ a mathematical 
formalism to model an often elementary process 
of the system (e.g. signal transduction from one 
phosphoprotein to the next, metabolism of one 
substance to another, expression of a gene from 
a transcription factor, etc.) and then integrate all 
these processes in a computable model such as 
the signaling pathway downstream of a receptor 
of interest, a metabolic pathway, or a gene 
expression network. Typically, data driven 
methods are employed to construct predictive 
models on a higher system level (e.g. gene regulatory 
networks numbering tens of thousands of 
nodes), albeit with limited detail. Mechanism driven 
models are used to construct more detailed 
models, albeit around a narrow region of interest. 
The trade-off between relevance and tractability 
dictates that the more detailed the model, the 
narrower the region. The reasons for this distinction 
are, first, that mechanism driven methods 
require high complexity data for the interrogated 
system (e.g. perturbation data, many time points 
etc.), implying a large number of samples that 
very often come at the cost of a small number of 
measured signals; and second, that the parameter 
estimation problem that eventually has to be solved 
becomes computationally very challenging for 
large models. On the other hand, data driven 
methods are computationally simpler and do not 
require such complex, multi-dimensional data, 
allowing the construction of models on a whole 
systems level. In the following paragraphs we 
review methodologies that address: (i) the 
modeling of signaling pathways, (ii) identifica- 
tion of drug MOA and (iii) prediction of clinical 
drug efficacy and toxicity; and discuss how these 
could be leveraged in a systems pharmacology 
framework to facilitate drug development in 
the clinic. 

Modeling signal transduction pathways by 
leveraging experimental data 

Modeling of signaling pathways refers to the 
process of identifying relationships that describe 
how signal propagates from one protein to the 
next, ultimately explaining the way cells respond 
to factors of their biochemical microenvironment 
[10]. The study of signaling pathways is of the 
utmost importance in both the discovery and the 
clinical phase of drug development; however, it 
takes place very differently in the two phases. In 
the discovery phase, typically in vitro data are 
used either to construct mechanistic models that 
describe in detail how signal transduction takes 
place in the cell type/tissue of interest and how 
this is affected by disease, or to construct less 
detailed but more extensive models, integrating 
dozens of pathways that orchestrate all major 
cellular processes. In the clinical phase, PD model- 
ing of biomarkers mostly expressed in blood and 
occasionally in biopsy samples, aims at the identifi- 
cation of signaling events only at the very narrow 
region of the pathway where the interrogated drug 
is expected to act (in the neighborhood of a few 
predefined, accessible biomarkers). For the study 
of this region, mechanistic PD modeling employs 
detailed methods such as ODE modeling and 
succeeds in capturing the dynamics of the impli- 
cated reactions; it then correlates the expression of 
these biomarkers with clinical endpoints. For 
example, Ramakrishnan et al. [11] built an ODE 
model to study the pharmacodynamic effects of 
methylprednisolone as a series of events initiating 
at the cytosolic glucocorticoid receptor, going 
through the heat shock proteins to the nucleus 
and resulting in the enhanced expression of STAT, 
with STAT being the ultimate pharmacodynamic 
endpoint. 

Pharmacodynamic modeling often fails to see 
the general context and broader processes that 
these biomarkers are a part of, and regulated by. 
In the following paragraphs we present methodol- 
ogies for the modeling of signaling pathways 
applied in early stages of drug development, but 
also suitable for the clinical phase. 

The methodologies for pathway modeling can be 
broken down into two classes: (i) Data inference 
methods, and (ii) mechanistic methods [12]. 

Data inference methods typically employ principal 
component analysis (PCA), partial least squares 
regression (PLSR), clustering, self-organizing 
maps and network inference algorithms such as 
mutual information (MI) and Bayesian inference, 
to identify cross-talks between the signaling 
molecules in the pathway, as these are captured 
in the experimental data at hand. They do not 
require any form of prior knowledge of the 
proteins' connectivity for their implementation, 
but extract all their predictions based on the train- 
ing dataset. In more detail, methods such as PLSR 
and PCA perform dimensionality reduction on the 
experimental dataset by projecting it to the dimen- 
sions of maximum variation. This facilitates data 
interpretation and the identification of qualitative 
trends in the signaling process [13]. PLSR also 
performs regression, correlating a perturbations 
matrix X with the signaling dataset Y, identifying 
in this manner the features of the perturbation 
matrix (stimuli, drugs etc.) that best explain the 
variance in the measurements [14]. Clustering 
and self-organizing maps employ a distance 
metric to identify signals in the experimental 
dataset that respond in similar fashion across all 
samples. Then a threshold is introduced above 
which these similarities imply an interaction 
between the two signals [12,15]. Mutual information, 
instead of using distance metrics, uses an 
integral function of the joint probability of any 
two signals over all samples to calculate the 
dependency between them. A threshold is then 
introduced, in similar fashion to the clustering 
methods, above which the dependency of the 
two signals is considered strong enough to imply 
an interaction between them [16]. Finally, 
Bayesian inference is one of the most powerful 
methods for data inference. It employs Bayes' 
rule to model the probability that an arbitrary signal 
is active as a function of its upstream signals 
[17,18]. These probabilities in the Bayesian 
framework are called conditional probability 
distributions (CPDs) and essentially capture the 
proteins' connectivity in the signaling pathway. 
The CPDs can be learned from the data using a 
training algorithm such as the expectation 
maximization algorithm. 
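
As a small illustration of the mutual information approach described above, the following sketch computes pairwise mutual information between signals and keeps the pairs that exceed a threshold. It assumes the measurements have already been discretized into on/off states, so the integral becomes a sum over observed state pairs. The signal names, the data and the threshold are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete signals over the same samples."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), n_ab in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_x[a] * count_y[b]))
    return mi

def infer_edges(signals, threshold_bits):
    """Return the signal pairs whose mutual information exceeds the threshold."""
    edges = []
    for s1, s2 in combinations(sorted(signals), 2):
        if mutual_information(signals[s1], signals[s2]) > threshold_bits:
            edges.append((s1, s2))
    return edges

# Hypothetical discretized phosphorylation states across eight samples.
SIGNALS = {
    "MEK": [0, 0, 1, 1, 0, 1, 1, 0],
    "ERK": [0, 0, 1, 1, 0, 1, 1, 0],  # follows MEK in every sample
    "AKT": [0, 1, 0, 1, 0, 1, 0, 1],  # statistically independent of MEK here
}

print(infer_edges(SIGNALS, threshold_bits=0.5))
```

Note that the inferred edges are undirected and that the result depends on the chosen threshold and discretization; both would have to be justified on real data.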

Mechanistic methods implement a mathematical 
formalism to model how signal propagates from 
one protein to the next within the signaling 
pathway. Typically these formalisms include that 
of ODEs, some form of logic modeling (such as 
Boolean logic or constrained fuzzy logic), or 
employ custom rules to model the signaling 
pathway as an interaction graph. Each of these 
formalisms essentially translates the pathway 
from an abstract graphical representation of 
protein interactions (as obtained from the litera- 
ture) into an executable model of the cell's signal- 
ing mechanisms, capable of simulating the signal 
flow from the receptor level, through the several 
kinases implicated in the signaling process, all 
the way into the transcription level. Mechanistic 
models offer the clear advantage of in silico 
experiments, i.e. what would the signaling process look 
like if the cells were stimulated with a growth factor, or 
with a candidate drug? The difference between the 
various formalisms lies in their perception of the 
signal transduction processes. The ODEs are one 
of the most detailed formalisms, employing the 
law of mass action kinetics to calculate the 
proteins' activation state over time [19,20]. 
Boolean logic assumes binary (0/1) values for 
the activation state of the included proteins and 
uses logic gates (AND/OR/NOT) to model the 
proteins' connectivity in the pathway. Then, signal 
flow is simulated by imposing boundary 
conditions on input nodes (receptors, or targets 
of compounds) and by propagating the signal 
downstream via the logic gates [21-25]. 
Constrained fuzzy logic likewise employs logic gates 
(AND/OR/NOT) but additionally incorporates a transfer 
function (typically a sigmoid curve) to calculate 
the activation state of a given node as a function of 
the activation state of its upstream nodes [26,27]. 
Finally, the simplest representation of a signaling 
pathway is that of a graph model and in 
particular, that of a signed directed graph. In 
signed directed graphs each edge indicates either 
a positive or a negative effect of one node upon 
another. In the work by Melas et al. (Detecting and 
removing inconsistencies between experimental 
data and signaling network topologies using 
integer linear programming on interaction graphs, 
accepted in PLoS Comp Biol, 2013), a set of rules is 
proposed based on the definition of the signed 
directed graphs that models signal transduction 
from one node to the next, implemented as a set 
of linear constraints. 
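
To make the Boolean formalism concrete, the sketch below simulates signal flow in a toy network: boundary conditions are imposed on the input nodes and the logic gates are evaluated repeatedly until the state no longer changes. The topology is a deliberately simplified, hypothetical example and not a curated pathway; the node names, the "readout" node and the synchronous update scheme are assumptions made for illustration.

```python
def simulate_boolean(rules, inputs, max_iter=50):
    """Synchronously update all non-input nodes until a steady state is reached."""
    state = {node: False for node in rules}
    state.update(inputs)  # boundary conditions on receptors / drug targets
    for _ in range(max_iter):
        new_state = dict(state)
        for node, rule in rules.items():
            if node not in inputs:  # input nodes keep their imposed value
                new_state[node] = rule(state)
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no steady state reached (the network may oscillate)")

# Toy logic model: each rule is a logic gate over the upstream nodes.
RULES = {
    "RAS": lambda s: s["EGF"],
    "MEK": lambda s: s["RAS"] and not s["MEK_inhibitor"],  # AND with a NOT
    "ERK": lambda s: s["MEK"],
    "IKK": lambda s: s["TNF"],
    "NFKB": lambda s: s["IKK"],
    "readout": lambda s: s["ERK"] or s["NFKB"],            # OR gate
}

control = simulate_boolean(RULES, {"EGF": True, "TNF": False, "MEK_inhibitor": False})
treated = simulate_boolean(RULES, {"EGF": True, "TNF": False, "MEK_inhibitor": True})
print("ERK without inhibitor:", control["ERK"])
print("ERK with inhibitor:   ", treated["ERK"])
```

Changing the input dictionary is the in silico experiment: the same model can be queried with a different stimulus or with a candidate drug represented as an inhibited node.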

In principle, mechanistic methods could be 
applied without using experimental data; their 
predictive power, however, is limited by the 
accuracy of the proteins' connectivity in the 
signaling pathways used as a scaffold, on top of 
which the models are built. Thus, if the signaling 
pathways (as obtained from the literature) 
represent the proteins' connectivity inaccurately, 
then the resultant models will yield erroneous 
predictions. To capture the true signaling motifs 
of the interrogated cell type, experimental data 
are usually incorporated and a training algorithm 
is employed to calibrate the model to best fit the 
data at hand. On this front, significant work has 
been published using (i) optimization algorithms, 
such as genetic algorithms or regular optimization 
formulations and (ii) sensitivity analysis. 

Regarding optimization formulations, in the 
work by Saez-Rodriguez et al. [28,29], a signaling 
pathway was put together downstream of six 
receptors of interest, based on literature citations 
of protein interactions, and Boolean logic was 
used to model signal transduction in the pathway. 
Then, a genetic algorithm pruned the pathway 
by removing reactions that seemed to contradict 
high throughput experimental data. In the work 
by Mitsos et al. [30-32], an Integer Linear 
Programming formulation was introduced to 
prune the pathway so that it best fits the experimental 
data at hand, diminishing the required CPU 
time and thus allowing the interrogation of complex 
pathways and phosphoproteomic datasets. An ILP 
formulation was also used in the work by Melas 
et al. (Detecting and removing inconsistencies 
between experimental data and signaling network 
topologies using integer linear programming on 
interaction graphs, accepted in PLoS Comp Biol, 
2013) to identify and remove inconsistencies 
between signaling pathway topologies and 
phosphoproteomic data. In addition to pruning the 
network, two more strategies were employed: (i) 
addition of de novo reactions and (ii) identification 
of minimum correction sets, defined as the 
minimum set of nodes that have to be corrected 
to obtain a perfect fit of the data. Other than 
optimization algorithms, methodologies based 
on sensitivity analysis are also used [33]. In these 
approaches, the connectivity of the proteins in 
the signaling pathway is inferred by considering 
infinitesimal changes in the activation of an 
arbitrary node A and monitoring changes in the 
activation of node B, while keeping the activation 
of other nodes constant. If the two nodes are co- 
regulated then an interaction may be present 
between them. 
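
The idea of pruning a literature-derived pathway against data can be illustrated on a toy problem. The sketch below is not the genetic algorithm or the ILP formulation of the cited works: it swaps in an exhaustive search over edge subsets, which is only feasible for very small networks, and it assumes activating edges combined by OR gates. The candidate edges, the experiments and the size penalty are hypothetical.

```python
from itertools import combinations

def simulate(edges, active_inputs):
    """Propagate activity downstream: a node turns on if any retained parent is on."""
    active = set(active_inputs)
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if src in active and dst not in active:
                active.add(dst)
                changed = True
    return active

def mismatch(edges, experiments):
    """Count measured nodes whose simulated state disagrees with the data."""
    errors = 0
    for stimuli, measured in experiments:
        active = simulate(edges, stimuli)
        errors += sum((node in active) != bool(value) for node, value in measured.items())
    return errors

def prune(candidate_edges, experiments, size_penalty=0.01):
    """Exhaustively search edge subsets for the best fit, preferring smaller networks."""
    best, best_score = None, float("inf")
    for k in range(len(candidate_edges) + 1):
        for subset in combinations(candidate_edges, k):
            score = mismatch(subset, experiments) + size_penalty * k
            if score < best_score:
                best, best_score = subset, score
    return list(best), best_score

# Hypothetical prior-knowledge network and discretized measurements.
CANDIDATE_EDGES = [("EGF", "MEK"), ("MEK", "ERK"), ("TNF", "ERK"),
                   ("TNF", "NFKB"), ("EGF", "AKT")]
EXPERIMENTS = [
    ({"EGF"}, {"ERK": 1, "NFKB": 0, "AKT": 1}),
    ({"TNF"}, {"ERK": 0, "NFKB": 1, "AKT": 0}),
]

pruned, score = prune(CANDIDATE_EDGES, EXPERIMENTS)
print("retained edges:", pruned)
print("objective:", round(score, 2))
```

The number of subsets doubles with every candidate edge, which is why realistic pathways call for the genetic algorithm or ILP formulations discussed above rather than enumeration.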

Even though a few years ago the phosphoproteomic 
data needed to perform this type of analysis were 
not easy to obtain, recent advancements in high 
throughput proteomics technologies now allow 
the quantification of dozens of proteins per sample 
with minimum sample requirements. The three 
platforms most suitable for this endeavor are 
protein microarrays (or planar arrays) [34,35], the 
xMAP technology (or suspension arrays) [36] and 
flow cytometry [37] since they combine high signal 
and sample throughput (see Box 1). Mass spectrometry, on 
the other hand, is ideal for exploratory purposes 
(e.g. identifying which proteins are expressed in a specific 
tissue) but cannot be used in the clinic because 
of its sample requirements. 

The systematic modeling of signaling pathways, 
as implemented in the discovery phase, results in 
predictive models of the signaling mechanisms 
in the cell type/tissue of interest. In particular, 
the various data inference algorithms and logic 
modeling, by leveraging high throughput 
proteomic data, succeed in integrating into predictive 
models a multitude of pathways responsible 
for most major cellular functions. If applied 
within a systems pharmacology framework, these 
approaches could potentially uncover the 
signaling processes that take place in the patient 
and provide a systems framework for the 
interpretation of PD biomarker data. For example, 
instead of only measuring the activation levels of a 
few biomarkers related to the interrogated disease, 
without truly identifying the mechanisms that 
regulate their expression, one could also measure 
phosphoproteins that play a central role in cellular 
functions and construct extensive models that 
correlate them with the predefined biomarkers. In 
this way, a broader view of the mechanisms that 
regulate the expression of these biomarkers is 
obtained and the processes that govern clinical 
response may be deconvoluted. For example, 
Iadevaia et al. [38] constructed an ODE model based 
on phosphoproteomic data to model signal 
transduction downstream of the IGF1 receptor, 
which was then leveraged to identify optimal drug 
combinations for inhibiting cell proliferation. In 
addition to PD modeling, these approaches can 
also be used in personalized medicine. The 
construction of patient specific models may 
uncover mechanisms in the progression of the 
disease that differ from patient to patient, facilitat- 
ing the selection of optimal therapies. On this front, 
significant work has been published using reverse 
phase protein arrays (RPPA) [39,40]. For example, 
Wulfkuhle et al. [41] studied the role of the ERK1/2 
pathway in ovarian cancer, demonstrating that 
patterns in signaling pathway activation in ovarian 
tumors may be patient-specific rather than stage- 
specific. Moreover, in the work by Ihle et al. [42], 
RPPA data were used in combination with 
transcriptomic data to study the effects of KRAS 
substitutions on protein behavior and how that affected 
signaling and clinical outcome. Finally, pathway 
modeling, applied longitudinally during the 
course of treatment, may uncover mechanisms of 
drug resistance [43] and facilitate the selection of 
the optimal frequency of administration. 

Identification of drug mode of action (MOA) 

Identification of drug mode of action (MOA) 
refers to the process of understanding how a drug 
affects signaling activity. Even though the binding 
affinities of the drug are well known from 
bioactivity assays performed at earlier phases of 
drug development, the functional effects of the 
drug on the signaling mechanisms of the target 
tissue are usually not fully characterized. As a 
result, off target effects may still be identified 
several years after the drug has been made available 
to patients. A number of methods have been 
proposed for the identification of drug MOA that 
either employ extensive phosphoproteomic measurements 
to directly capture drug effects on the 
signaling level, or exploit public repositories that 
screen the effects of hundreds of drugs on cell 
lines and then use a machine learning algorithm 
to deconvolute these signatures and identify where 
the interrogated drug acts. Unfortunately, these 
approaches are generally not part of clinical 
development. In clinical trials the effects of the 
drug are usually evaluated with respect to a set 
of predefined PD and safety biomarkers and/or 
well-established clinical endpoints. However, 
adverse drug effects of low incidence may stay 
undetected until enough patients have been 
exposed to the drug. In the following paragraphs 
we present commonly used methodologies for 
the identification of drug MOA that could be 
leveraged within a systems pharmacology frame- 
work with great benefits. 

There are two classes of methods that have 
been proposed for the identification of adverse 
drug effects: (i) machine learning approaches and (ii) 
mechanistic approaches [44]. Machine learning 
approaches leverage extensive data repositories, 
capturing the effects of hundreds of compounds 
on different cell lines/patients, and a clustering 
or classification algorithm is employed to predict 
the effects of new (uncharacterized) drugs based 
on their signature and known targets of similar 
drugs [45-47]. Any type of data can be used in a 
machine learning framework including pheno- 
typic, transcriptomic, signaling, chemoproteomic, 
structural, or other data, as long as there are 
enough drugs available in the training dataset to 
guarantee the statistical significance of the predic- 
tions. For example, Campillos et al. [47] used drug 
side effects, as these are listed in package leaflets, 
to identify more than 1000 drug-drug relations 
between 746 marketed drugs. In this context, 
drug-drug relations refer to data-driven predictions 
that two drugs share a target. 
Interestingly, 261 of these relations are between 
chemically dissimilar drugs, while a number of 
them were validated experimentally. In another 
application of machine learning approaches, Iorio 
et al. [46] exploited the cMAP repository [48], 
screening the effects of more than 6000 compounds 
on the transcriptomic level of cell lines, to identify 
more than 40000 relations between 1300 drugs. 
Iorio et al. used a distance metric to score the 
similarities in drug signatures and, when two 
drugs demonstrated a statistically significant 
similarity, a drug-drug correlation was introduced. 
In another work by Gregori-Puigjané 
et al. [45], a chemoinformatics approach was used 
leveraging the DrugBank.ca repository to identify 
previously unreported mechanism-of-action 
targets for drugs. 
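
The signature-based construction of a drug-drug network can be 
illustrated with a minimal sketch. It is not the method of Iorio 
et al.: the Spearman rank distance, the fixed threshold (standing in 
for a statistical significance test), the function names and the toy 
signatures are all assumptions made for illustration. 

```python
from itertools import combinations


def ranks(values):
    """Return the rank (1 = smallest) of each entry; ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_distance(sig_a, sig_b):
    """Return 1 - Spearman correlation between two expression signatures."""
    ra, rb = ranks(sig_a), ranks(sig_b)
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
    return 1 - rho


def drug_network(signatures, threshold):
    """Link every pair of drugs whose signature distance is below threshold."""
    edges = []
    for a, b in combinations(sorted(signatures), 2):
        dist = spearman_distance(signatures[a], signatures[b])
        if dist < threshold:
            edges.append((a, b, round(dist, 3)))
    return edges


# Hypothetical toy signatures: differential expression of five genes per drug.
signatures = {
    "drug_A": [2.1, -0.5, 1.3, 0.2, -1.8],
    "drug_B": [1.9, -0.7, 1.1, 0.4, -2.0],
    "drug_C": [-1.5, 0.9, -0.8, 0.1, 2.2],
}
edges = drug_network(signatures, threshold=0.3)
print(edges)
```

In this toy example drug_A and drug_B rank the five genes identically 
and are therefore linked, whereas drug_C, with a reversed profile, 
remains unconnected. 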

Even though machine learning approaches 
make robust predictions, they do not provide 
insight into the exact mechanism of action of the 
interrogated drug. This is an inherent limitation of 
the use of gene expression, phenotypic, 
structural, or other data in drug development, since 
most drugs act at the phosphoproteomic 
level; thus, any effects at the gene expression or 
other level are second-order (i.e. indirect) effects 
[49]. Mechanistic approaches are based on 
phosphoproteomic data, circumventing this limita- 
tion. Mechanistic approaches use in vitro data in 
the presence or absence of the interrogated drug 
and either identify their differences, estimating in 
this manner the drug MOA directly [50-52], or 
construct models of the signaling pathway before 
and after drug treatment in order to identify drug 
induced model alterations. For example, Bantscheff 
et al. [50] used mass spectrometry (MS) to reveal 
mechanisms of action of clinical ABL kinase 
inhibitors, including unreported targets of imatinib. 
Mass spectrometry is the method of choice for these 
questions outside clinical development, since it 
allows the complete proteomic profiling of the 
sample (measuring up to 20000 proteins at the 
same time), essentially measuring the effect of 
the interrogated drug on the whole proteome of 
the target tissue. Unfortunately, MS cannot be 
used effectively in the clinical phase because of 
its sample requirements; on the other hand, 
protein microarrays, Luminex xMAP and flow 
cytometry can be used. These technologies 
provide much lower signal throughput compared 
with MS; however, their output data can be used 
for the construction of mechanistic models; these 
models may then be leveraged to identify drug 
effects on the signaling level. For example, Mitsos 
et al. [30] used Luminex xMAP to construct 
signaling models of HepG2 cells upon treatment 
with four different hepatocellular carcinoma 
drugs and then identified their differences with 
the control model (specific to HepG2 cells), thus 
obtaining topology alterations caused by the drug. 
In this manner, previously unreported off-target 
effects were uncovered. Protein microarrays can 
also be used in this context. Their very low sample 
requirements make them an ideal approach for 
clinical applications and they have already been 
used extensively in the study of signaling 
pathways. Moreover, in a very interesting work 
by Kumar et al. [53], gene expression data were 
leveraged using causal reasoning to identify the 
MOA of a novel AKT kinase inhibitor 
(GSK690693). Causal reasoning is a methodology 
that infers patterns at the phosphoproteomic level 
that best explain the observed changes at the gene 
expression level [54]. 
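
The model-comparison idea described above (a control model versus a 
drug-treated model) can be sketched as a simple difference of network 
topologies. The sketch assumes that both models have already been 
fitted to data and are available as sets of directed edges; the 
fitting step itself is not shown. The node names, the function names 
and the notion of a single nominal target are hypothetical and serve 
only as an illustration. 

```python
def topology_alterations(control_edges, treated_edges):
    """Compare two signaling models given as sets of (source, target) edges."""
    control, treated = set(control_edges), set(treated_edges)
    return {
        "lost": sorted(control - treated),    # edges present only without drug
        "gained": sorted(treated - control),  # edges present only with drug
    }


def off_target_candidates(alterations, nominal_target):
    """Return lost edges that do not involve the drug's nominal target."""
    return [edge for edge in alterations["lost"] if nominal_target not in edge]


# Hypothetical toy models: the drug is assumed to inhibit kinase_1.
control_model = {
    ("ligand", "receptor"),
    ("receptor", "kinase_1"),
    ("kinase_1", "kinase_2"),
    ("receptor", "kinase_3"),
}
treated_model = {
    ("ligand", "receptor"),
    ("receptor", "kinase_1"),
}

alterations = topology_alterations(control_model, treated_model)
candidates = off_target_candidates(alterations, nominal_target="kinase_1")
print(alterations)
print(candidates)
```

Here the loss of the kinase_1 to kinase_2 edge is expected from the 
nominal target, while the loss of the receptor to kinase_3 edge is 
flagged as a candidate off-target effect that would need experimental 
confirmation. 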

The rigorous study of drug MOA, as presented 
above, has repeatedly identified unreported drug 
targets, even years after the drug has made it to 
the market. If applied in clinical development, it 
could uncover the mechanism of action more fully, 
possibly identifying off-target effects that would 
otherwise remain unnoticed and, thus, affect the 
outcome of subsequent clinical trials. In this 
manner, the identification of drug MOA may 
decrease the attrition rate of clinical trials by 
advancing drug candidates with minimal adverse 
effects, or help towards personalized medicine, 
where optimal combination therapies and drug 
dosage can be devised for each patient. Drug 
repurposing is another possible application. 

Prediction of clinical efficacy and toxicity 

Prediction of clinical efficacy and toxicity refers to 
the process of predicting the clinical response to a 
drug, in terms of validated endpoints, based on phe- 
notypic, transcriptomic, signaling, chemoproteomic, 
structural or other (preclinical) data. The prediction 
of efficacy/toxicity is a key issue in all phases of 
drug development. In the discovery phase, systems 
approaches either leverage extensive in vitro 
datasets via machine learning or clustering analy- 
sis, in an attempt to predict the clinical efficacy/ 
toxicity of uncharacterized drugs based on their 
signatures and known clinical outcomes of 
similar drugs; or construct predictive models to 
describe the signaling, metabolic or transcriptomic 
processes in the cell type/tissue of interest and 
uncover how these are altered in disease or 
orchestrate the expression of key toxicity mediators. 
In the latter case, either in vitro or in vivo data can be 
used, with in vivo data demonstrating clear 
advantages in terms of predictive power. Typically, 
though, in vivo data are considerably scarcer 
than in vitro data because of their cost. 

Regarding data-driven methodologies, as in the 
identification of drug MOA (discussed above), 
any type of data can be used in a 
machine learning or clustering framework to 
predict the clinical efficacy and toxicity of the 
interrogated drug based on clinical outcomes of 
similar drugs. In the work by Barretina et al. [55] 
it was demonstrated how the Cancer Cell Line 
Encyclopedia (CCLE) can be leveraged to predict 
drug sensitivities of cell lines according to their 
genotype, facilitating application to personalized 
medicine. The CCLE includes the full transcriptomic 
screening of around 1000 cancer cell lines (fully 
characterized) and for 500 of these also includes 
the pharmacological profiling of 24 compounds. 
It is, thus, a valuable resource for personalized 
medicine. Another valuable resource in gene 
expression-driven prediction of efficacy is the 
Genomics of Drug Sensitivity in Cancer (GDSC) 
project [56], where the effects of 140 drugs were 
screened against 1200 cancer cell lines, also 
including dose response data. Moreover, Iorio 
et al. [46] presented a clustering-based approach to 
identify suitable drug repositionings, exploiting 
the cMAP repository of gene expression profiles; 
while in the work by Wessels et al. [57], a clinical 
pharmacogenetic model was developed based on 
custom data to predict the efficacy of methotrexate 
in rheumatoid arthritis, further demonstrating 
how genomic data can be leveraged to obtain 
robust predictions of drug efficacy. Apart from 
genomic data, Xiang-Qun Xie [58] demonstrated 
how chemoproteomic and structural data in 
PubChem can be used for virtual screening purposes, 
including the prediction of efficacy and toxicity. 
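
As a rough illustration of how a data-driven prediction of drug 
sensitivity might work, the sketch below predicts the response of a 
new cell line from the measured responses of its most similar profiled 
cell lines. It uses a k-nearest-neighbour average, which is a 
simplification chosen for brevity and not the modeling approach of the 
cited studies; the expression values, the response values and the 
function names are hypothetical. 

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_sensitivity(profile, training, k=2):
    """Average the measured drug response of the k most similar cell lines.

    `training` is a list of (expression_profile, response) pairs, where the
    response could be, for example, a log-transformed IC50 value.
    """
    nearest = sorted(training, key=lambda row: euclidean(profile, row[0]))[:k]
    return sum(response for _, response in nearest) / len(nearest)


# Hypothetical training set: three genes per cell line and one drug response.
training = [
    ([1.0, 0.2, 3.1], -1.0),  # sensitive-like profile
    ([0.9, 0.3, 2.9], -0.8),  # sensitive-like profile
    ([3.0, 2.5, 0.1], 1.5),   # resistant-like profile
    ([2.8, 2.7, 0.3], 1.7),   # resistant-like profile
]
new_cell_line = [1.1, 0.1, 3.0]
predicted = predict_sensitivity(new_cell_line, training, k=2)
print(round(predicted, 2))
```

The new profile lies closest to the two sensitive-like cell lines, so 
the predicted response is the mean of their responses. In practice the 
number of features far exceeds the number of cell lines, and 
regularized or otherwise validated models would be needed. 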

Mechanism-driven methodologies 
include the construction of predictive models 
(mostly network models) either to describe the 
signaling, metabolic or transcriptomic processes 
in the cell type/tissue of interest and how these 
are altered in disease, or to correlate the drug target 
with key efficacy and toxicity mediators. On this 
front, the work by Klipp et al. [59], Folger et al. 
[60] and Kim et al. [61] demonstrates how 
signaling and metabolic network models can be 
used for effective drug targeting. In another 
interesting work by Hwang et al. [62], the authors 
presented a systems approach to tackling prion 
disease, also suggesting possible therapeutic 
approaches. Regarding toxicity studies, Cosgrove 
et al. [63] leveraged cytokine release measurements 
to associate toxicity in human hepatocytes 
with signaling network dysregulation. 

This type of analysis, if applied in the clinical 
phase in conjunction with standard PD modeling, 
could provide investigators with valuable 
predictions on the clinical efficacy and toxicity of 
the interrogated drug, as well as feedback to 
research for the identification of new targets. 

Conclusion 

Computational modeling offers an unmatched 
solution for the interpretation of extensive 
datasets, which are increasingly becoming the norm in drug 
development as proteomic technologies generate 
high-content data with minimal sample 
requirements. Unfortunately, the community still 
seems reluctant to bring elaborate modeling 
methodologies, such as those applied in the 
discovery phase of drug development, into the 
clinic. Here we presented a number of systems 
biology methodologies addressing: (i) the modeling 
of signaling pathways, (ii) the identification 
of drug MOA and (iii) the prediction of clinical 
drug efficacy and toxicity. These methodologies are 
currently applied in the discovery phase but could be 
leveraged within a systems pharmacology framework 
with great benefits. 